Articulated Object


sim2art: Accurate Articulated Object Modeling from a Single Video using Synthetic Training Data Only

Artykov, Arslan, Sautier, Corentin, Lepetit, Vincent

arXiv.org Artificial Intelligence

Understanding articulated objects is a fundamental challenge in robotics and digital twin creation. To effectively model such objects, it is essential to recover both part segmentation and the underlying joint parameters. Despite the importance of this task, previous work has largely focused on constrained setups such as multi-view systems, object scanning, or static cameras. In this paper, we present the first data-driven approach that jointly predicts part segmentation and joint parameters from monocular video captured with a freely moving camera. Trained solely on synthetic data, our method demonstrates strong generalization to real-world objects, offering a scalable and practical solution for articulated object understanding. Our approach operates directly on casually recorded video, making it suitable for real-time applications in dynamic environments.
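
For concreteness, the prediction target of this kind of method can be pictured as per-point part labels plus joint parameters (an axis direction and a point on the axis). The minimal NumPy sketch below animates such a predicted revolute joint; the names and toy data are illustrative assumptions, not the sim2art interface.

```python
# A minimal sketch of the output representation such a method might predict:
# per-point part labels plus revolute joint parameters (axis direction and a
# point on the axis). Names and toy data are assumptions, not the sim2art API.
import numpy as np

def rotate_about_axis(points, axis_dir, axis_point, angle):
    """Rotate points about an arbitrary axis using Rodrigues' formula."""
    k = axis_dir / np.linalg.norm(axis_dir)
    p = points - axis_point                      # move the axis to the origin
    cos, sin = np.cos(angle), np.sin(angle)
    rotated = (p * cos
               + np.cross(k, p) * sin
               + k * (p @ k)[:, None] * (1.0 - cos))
    return rotated + axis_point

# Toy "prediction": 100 points; points labeled 1 belong to the moving part.
points = np.random.rand(100, 3)
labels = (points[:, 0] > 0.5).astype(int)        # fake part segmentation
axis_dir, axis_point = np.array([0.0, 0.0, 1.0]), np.array([0.5, 0.0, 0.0])

# Animate the predicted articulation: only the moving part rotates.
moved = points.copy()
moved[labels == 1] = rotate_about_axis(points[labels == 1],
                                       axis_dir, axis_point, np.pi / 6)
```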


Kinematic Kitbashing for Modeling Functional Articulated Objects

Guo, Minghao, Zordan, Victor, Andrews, Sheldon, Matusik, Wojciech, Agrawala, Maneesh, Liu, Hsueh-Ti Derek

arXiv.org Artificial Intelligence

We introduce Kinematic Kitbashing, an automatic framework that synthesizes functionality-aware articulated objects by reusing parts from existing models. Given a kinematic graph with a small collection of articulated parts, our optimizer jointly solves for the spatial placement of every part so that (i) attachments remain geometrically sound over the entire range of motion and (ii) the assembled object satisfies user-specified functional goals such as collision-free actuation, reachability, or trajectory following. At its core is a kinematics-aware attachment energy that aligns vector distance function features sampled across multiple articulation snapshots. We embed this attachment term within an annealed Riemannian Langevin dynamics sampler that treats functionality objectives as additional energies, enabling robust global exploration while accommodating non-differentiable functionality objectives and constraints. Our framework produces a wide spectrum of assembled articulated shapes, from trash-can wheels grafted onto car bodies to multi-segment lamps, gear-driven paddlers, and reconfigurable furniture, and delivers strong quantitative improvements over state-of-the-art baselines across geometric, kinematic, and functional metrics. By tightly coupling articulation-aware geometry matching with functionality-driven optimization, Kinematic Kitbashing bridges part-based shape modeling and functional assembly design, empowering rapid creation of interactive articulated assets.
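
To make the sampling idea concrete, the sketch below runs annealed Langevin dynamics over a placement vector against a stand-in energy. It is deliberately Euclidean rather than Riemannian, and the quadratic `energy` merely stands in for the paper's attachment and functionality terms; all names here are assumptions.

```python
# A simplified, Euclidean sketch of annealed Langevin sampling over part
# placements. The paper's sampler is Riemannian and its energy aligns vector
# distance function features; here `energy` is a stand-in quadratic.
import numpy as np

def energy(x):
    # Stand-in for attachment + functionality energies over placement params x.
    return np.sum((x - 1.0) ** 2)

def grad_energy(x, eps=1e-5):
    # Finite-difference gradient; useful when parts of the energy (e.g.
    # functionality constraints) are non-differentiable or black-box.
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (energy(x + d) - energy(x - d)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
x = rng.normal(size=6)                            # e.g. translation + rotation params
for temperature in np.geomspace(1.0, 1e-3, 50):   # annealing schedule
    step = 0.05 * temperature
    noise = np.sqrt(2 * step * temperature) * rng.normal(size=x.shape)
    x = x - step * grad_energy(x) + noise         # Langevin update
print("final placement params:", np.round(x, 3))
```

Annealing lets the sampler explore broadly at high temperature and settle into a low-energy placement as the noise decays.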


Where2Explore: Few-shot Affordance Learning for Unseen Novel Categories of Articulated Objects

Neural Information Processing Systems

Articulated object manipulation is a fundamental yet challenging task in robotics. Due to significant geometric and semantic variations across object categories, previous manipulation models struggle to generalize to novel categories. Few-shot learning is a promising solution for alleviating this issue by allowing robots to perform a few interactions with unseen objects. We observe that, despite their distinct shapes, different categories often share similar local geometries essential for manipulation, such as pullable handles and graspable edges, a factor typically underutilized in previous few-shot learning works. To harness this commonality, we introduce 'Where2Explore', an affordance learning framework that effectively explores novel categories with minimal interactions on a limited number of instances.
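
The exploration strategy can be illustrated with a toy loop that spends a small interaction budget on the contact points whose local geometry looks least familiar. In the sketch below the similarity and interaction functions are random stubs, and every name is an assumption rather than the paper's code.

```python
# A minimal sketch of uncertainty-driven exploration over candidate contact
# points: interact where local geometry is least similar to training data.
import numpy as np

rng = np.random.default_rng(0)

def similarity_to_training(points):
    # Stub: how close each point's local geometry is to geometries seen
    # during training (a learned similarity network in a real system).
    return rng.random(len(points))

def simulate_interaction(point):
    # Stub: ground-truth outcome of e.g. a pull at `point` (1.0 = success).
    return float(point[2] > 0.5)

points = rng.random((200, 3))               # candidate contact points
labels = {}
for _ in range(5):                          # budget: five interactions
    sim = similarity_to_training(points)
    idx = int(np.argmin(sim))               # explore the least familiar geometry
    labels[idx] = simulate_interaction(points[idx])
    # A real system would fine-tune the affordance net on `labels` here.
print(f"collected {len(labels)} labeled interactions")
```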


GAMMA: Generalizable Articulation Modeling and Manipulation for Articulated Objects

Yu, Qiaojun, Wang, Junbo, Liu, Wenhai, Hao, Ce, Liu, Liu, Shao, Lin, Wang, Weiming, Lu, Cewu

arXiv.org Artificial Intelligence

Articulated objects like cabinets and doors are widespread in daily life. However, directly manipulating 3D articulated objects is challenging because they have diverse geometrical shapes, semantic categories, and kinetic constraints. Prior works mostly focused on recognizing and manipulating articulated objects with specific joint types. They can either estimate the joint parameters or distinguish suitable grasp poses to facilitate trajectory planning. Although these approaches have succeeded with certain types of articulated objects, they lack generalizability to unseen objects, which significantly impedes their application in broader scenarios. In this paper, we propose Generalizable Articulation Modeling and Manipulation for Articulated Objects (GAMMA), a novel framework that learns both articulation modeling and grasp pose affordance from diverse articulated objects across different categories. In addition, GAMMA adopts adaptive manipulation to iteratively reduce modeling errors and enhance manipulation performance. We train GAMMA on the PartNet-Mobility dataset and evaluate it with comprehensive experiments in SAPIEN simulation and on a real-world Franka robot. Results show that GAMMA significantly outperforms SOTA articulation modeling and manipulation algorithms on unseen and cross-category articulated objects. We will open-source all code and datasets for both simulation and real robots for reproduction in the final version. Images and videos are published on the project website at: http://sites.google.com/view/gamma-articulation
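
The adaptive-manipulation idea, estimate the joint, act a little, and correct the estimate from the observed motion, can be sketched as a simple loop. Everything below is a stub under assumed names; it only illustrates the estimate-act-refine structure, not GAMMA's actual networks.

```python
# A minimal sketch of an adaptive estimate-act-refine loop: model the joint,
# act a little, and correct the model from what actually moved. All functions
# here are stubs and assumptions for illustration.
import numpy as np

def estimate_axis(point_cloud):
    # Stub for a learned articulation model; returns a unit joint-axis guess.
    noisy = np.array([0.1, 0.0, 1.0]) + 0.05 * np.random.randn(3)
    return noisy / np.linalg.norm(noisy)

def execute_and_observe(axis, step=0.05):
    # Stub: command a small motion along `axis` and return the observed
    # motion direction (here, the true axis corrupted by sensor noise).
    true_axis = np.array([0.0, 0.0, 1.0])
    obs = true_axis + 0.02 * np.random.randn(3)
    return obs / np.linalg.norm(obs)

axis = estimate_axis(point_cloud=None)
for _ in range(10):
    observed = execute_and_observe(axis)
    axis = 0.7 * axis + 0.3 * observed      # blend the model with evidence
    axis /= np.linalg.norm(axis)
print("refined joint axis:", np.round(axis, 3))
```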


Online Estimation of Articulated Objects with Factor Graphs using Vision and Proprioceptive Sensing

Buchanan, Russell, Röfer, Adrian, Moura, João, Valada, Abhinav, Vijayakumar, Sethu

arXiv.org Artificial Intelligence

From dishwashers to cabinets, humans interact with articulated objects every day, and for a robot to assist in common manipulation tasks, it must learn a representation of articulation. Recent deep learning methods can provide powerful vision-based priors on the affordance of articulated objects from previous, possibly simulated, experiences. In contrast, many works estimate articulation by observing the object in motion, requiring the robot to already be interacting with the object. In this work, we combine the best of both worlds by introducing an online estimation method that merges vision-based affordance predictions from a neural network with interactive kinematic sensing in an analytical model. Our approach has the benefit of using vision to predict an articulation model before touching the object, while also being able to update the model quickly from kinematic sensing during the interaction. We implement a full system that uses shared autonomy for robotic opening of articulated objects, in particular objects whose articulation is not apparent from vision alone. We deployed the system on a real robot and performed several autonomous closed-loop experiments in which the robot had to open a door with an unknown joint while estimating the articulation online. Our system achieved an 80% success rate for autonomous opening of unknown articulated objects.
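
As a toy illustration of the fusion, consider a single variable, say the hinge radius, with a Gaussian prior from vision and repeated kinematic measurements entering as factors. The precision-weighted combination below is what solving that one-variable factor graph amounts to; the numbers are invented, and a real system would use a full factor-graph library such as GTSAM.

```python
# A toy, one-variable version of the vision + proprioception fusion idea:
# a Gaussian prior from a vision network and kinematic measurements enter as
# factors, and the posterior is their precision-weighted combination.
import numpy as np

# Vision prior on the hinge radius (metres): mean and standard deviation.
prior_mu, prior_sigma = 0.80, 0.10

# Kinematic measurements of the radius gathered while pulling the handle.
meas = np.array([0.71, 0.69, 0.72, 0.70])
meas_sigma = 0.02

# Information-form fusion, equivalent to solving this one-variable factor graph.
w_prior = 1.0 / prior_sigma**2
w_meas = len(meas) / meas_sigma**2
posterior_mu = (w_prior * prior_mu + w_meas * meas.mean()) / (w_prior + w_meas)
posterior_sigma = (w_prior + w_meas) ** -0.5

print(f"posterior radius: {posterior_mu:.3f} +/- {posterior_sigma:.3f} m")
```

Note how a handful of accurate kinematic measurements quickly dominates the looser vision prior, which is exactly the intended division of labor between the two sensing modes.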


Learning Agent-Aware Affordances for Closed-Loop Interaction with Articulated Objects

Schiavi, Giulio, Wulkop, Paula, Rizzi, Giuseppe, Ott, Lionel, Siegwart, Roland, Chung, Jen Jen

arXiv.org Artificial Intelligence

Interactions with articulated objects are a challenging but important task for mobile robots. To tackle this challenge, we propose a novel closed-loop control pipeline, which integrates manipulation priors from affordance estimation with sampling-based whole-body control. We introduce the concept of agent-aware affordances, which fully reflect the agent's capabilities and embodiment, and we show that they outperform their state-of-the-art counterparts, which are conditioned only on the end-effector geometry. Additionally, closed-loop affordance inference is found to allow the agent to divide a task into multiple non-continuous motions and to recover from failure and unexpected states. Finally, the pipeline is able to perform long-horizon mobile manipulation tasks, i.e., opening and closing an oven, in the real world with high success rates (opening: 71%, closing: 72%).
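
The closed-loop structure described above can be sketched as: re-infer affordances, rank candidates, check whole-body feasibility, execute, and repeat. All functions below are random stubs, and every name is an assumption rather than the paper's implementation.

```python
# A stub-level sketch of a closed-loop affordance pipeline: re-infer, rank,
# check whole-body feasibility, execute the best candidate, repeat.
import numpy as np

rng = np.random.default_rng(0)

def infer_affordance(observation):
    # Stub for agent-aware affordance inference: one score per grasp candidate.
    return rng.random(32)

def whole_body_feasible(candidate):
    # Stub sampling-based check for the full body, not just the end effector.
    return rng.random() > 0.3

def execute(candidate, progress):
    # Stub: run the motion and report new task progress in [0, 1].
    return min(1.0, progress + 0.25)

progress = 0.0
while progress < 1.0:                        # closed loop: re-plan every step
    scores = infer_affordance(observation=None)
    ranked = np.argsort(scores)[::-1]        # best-scoring candidates first
    chosen = next((c for c in ranked if whole_body_feasible(c)), ranked[0])
    progress = execute(chosen, progress)
print("task complete")
```

Because affordances are re-inferred every iteration, the agent can naturally split a task into several discontinuous motions or recover after a failed attempt.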


Research Highlight: Enabling Robot Interaction With Articulated Objects

CMU School of Computer Science

Research from Carnegie Mellon University's Robotics Institute could one day allow robots to seamlessly open drawers, doors and lids on hinges. While humans interact with various articulated objects daily -- opening a refrigerator door or lifting a toilet seat are good examples -- these tasks present a challenge in robotics. Ben Eisner and Harry Zhang, both graduate students in Assistant Professor David Held's Robots Perceiving and Doing Lab, designed a new way to train robots to perceive and manipulate articulated objects in their project, "FlowBot3D: Learning 3D Articulation Flow To Manipulate Articulated Objects." The team presented their research at Robotics: Science and Systems this year, where it was a finalist for a best paper award. FlowBot3D uses a vision-based system to help robots learn how to interact with many different kinds of articulated objects.
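
At a high level, a flow-based policy of this kind can be pictured as: predict a per-point 3D motion ("flow") for the articulated part, grasp the point predicted to move the most, and pull along its flow direction. The sketch below stubs the network with random values; the names are illustrative assumptions, not the FlowBot3D code.

```python
# A minimal sketch of a flow-based manipulation policy: grasp the point with
# the largest predicted motion and move along its predicted direction.
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((500, 3))                 # observed point cloud
flow = rng.normal(size=(500, 3)) * 0.01       # stub per-point flow prediction

magnitudes = np.linalg.norm(flow, axis=1)
grasp_idx = int(np.argmax(magnitudes))        # point predicted to move most
direction = flow[grasp_idx] / magnitudes[grasp_idx]

grasp_point = points[grasp_idx]
next_waypoint = grasp_point + 0.02 * direction  # small step along the flow
print("grasp at", np.round(grasp_point, 3),
      "move toward", np.round(next_waypoint, 3))
```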


Manipulation of Articulated Objects using Dual-arm Robots via Answer Set Programming

Bertolucci, Riccardo, Capitanelli, Alessio, Dodaro, Carmine, Leone, Nicola, Maratea, Marco, Mastrogiovanni, Fulvio, Vallati, Mauro

arXiv.org Artificial Intelligence

The manipulation of articulated objects is of primary importance in robotics and can be considered one of the most complex manipulation tasks. Traditionally, this problem has been tackled by developing ad-hoc approaches, which lack flexibility and portability. In this paper, we present a framework based on Answer Set Programming (ASP) for the automated manipulation of articulated objects in a robot control architecture. In particular, ASP is employed for representing the configuration of the articulated object, for checking the consistency of that representation in the knowledge base, and for generating the sequence of manipulation actions. The framework is exemplified and validated on the Baxter dual-arm manipulator in a first, simple scenario. We then extend this scenario to improve the overall setup accuracy and to introduce a few constraints on robot action execution to enforce feasibility. The extended scenario entails a large number of possible actions that can be fruitfully combined, so we exploit macro actions from automated planning to produce more effective plans. We validate the overall framework in the extended scenario, confirming the applicability of ASP in more realistic robotics settings and showing the usefulness of macro actions for robot-based manipulation of articulated objects.
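
To give a flavor of the encoding style, the sketch below uses the clingo Python API to solve a drastically simplified toy: choosing an order in which to straighten the links of an articulated object under one precedence constraint. The encoding is an assumption for illustration, not the paper's knowledge base; it requires the `clingo` package.

```python
# A toy sketch of ASP-based action sequencing via the clingo Python API:
# the solver assigns one "straighten" action per step, subject to constraints.
import clingo

program = """
link(1..3).
step(1..3).
% Each step straightens exactly one link.
1 { straighten(L, S) : link(L) } 1 :- step(S).
% A link is straightened at most once.
:- straighten(L, S1), straighten(L, S2), S1 != S2.
% Example ordering constraint: link 1 must be handled before link 3.
:- straighten(1, S1), straighten(3, S3), S1 > S3.
"""

ctl = clingo.Control()
ctl.add("base", [], program)
ctl.ground([("base", [])])
ctl.solve(on_model=lambda m: print("plan:", m))
```

Constraints like hand reachability or feasibility checks enter the same way, as additional integrity constraints pruning answer sets.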


Estimating Mass Distribution of Articulated Objects through Physical Interaction

Kannabiran, Niranjan Kumar, Essa, Irfan, Liu, C. Karen

arXiv.org Artificial Intelligence

We explore the problem of an interactive agent estimating the mass distribution of an articulated object. Our method predicts the mass distribution accurately using only information that can be reliably acquired by the limited sensing and actuating capabilities of a robotic agent interacting with the object. Inspired by the role of exploratory play in human infants, we take a combined supervised and reinforcement learning approach to train the agent so that it learns to strategically interact with the object to estimate its mass distribution. Our method consists of two neural networks: a policy network, which decides how to interact with the object, and a predictor network, which estimates the mass distribution given a history of observations and interactions. Using our method, we train a robotic arm to estimate the mass distribution of an object with moving parts (e.g., an articulated rigid-body system) by pushing it on a surface with unknown friction properties. We also test the robustness of our learned model by transferring it to another robot arm with a different end-effector geometry. The empirical results show that our method significantly outperforms a baseline agent that uses random pushes to collect data for estimation.
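
The two-network split can be sketched with the predictor as a recurrent network over the interaction history, as below in PyTorch. Dimensions and names are assumptions for illustration, and the policy is omitted (the history here is just random data standing in for recorded pushes).

```python
# A minimal sketch of a predictor network: a GRU over a history of
# (observation, action) pairs mapping to per-link mass estimates.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, N_LINKS = 16, 4, 3

class MassPredictor(nn.Module):
    """Maps a history of (observation, action) pairs to per-link masses."""
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(OBS_DIM + ACT_DIM, hidden, batch_first=True)
        self.head = nn.Linear(hidden, N_LINKS)

    def forward(self, history):              # history: (batch, T, obs+act)
        _, h = self.rnn(history)
        return torch.nn.functional.softplus(self.head(h[-1]))  # masses > 0

predictor = MassPredictor()
history = torch.randn(8, 20, OBS_DIM + ACT_DIM)   # 8 episodes, 20 pushes each
masses = predictor(history)                        # shape: (8, N_LINKS)
print(masses.shape)
```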